18 research outputs found

    CONCEPTION ET MANIPULATION DE BASES DE DONNEES DIMENSIONNELLES À CONTRAINTES

    This thesis defines a constraint-based model dedicated to multidimensional databases. The model represents data as a constellation of facts (subjects of analysis) associated with dimensions (axes of analysis), which may be shared. Each dimension is organised into several hierarchies (analysis views) that integrate several levels of data granularity. To ensure data consistency, five semantic constraints (exclusion, inclusion, partition, simultaneity, totality) are introduced, which can be intra-dimension or inter-dimension. Intra-dimension constraints express relationships between hierarchies within the same dimension, whereas inter-dimension constraints concern hierarchies of distinct dimensions. The repercussions of these constraints on multidimensional manipulations are studied and OLAP operator extensions are provided.

    The growth of data volumes in information systems is now a reality that every company must face. In particular, a company must enable its managers to identify relevant information so that they can make the right decisions as quickly as possible. Decision support systems meet these needs by providing models and techniques for manipulating data. Within this context, my thesis work studies the modelling of decision-support data and proposes a suitable manipulation language. First, we propose a dimensional model that organises data as a constellation of facts (subjects of analysis) associated with dimensions (axes of analysis) that may be shared. Our model ensures greater data consistency through its multi-instantiation property, which makes it possible to specify the conditions under which dimension instances belong to hierarchies. In addition, we define constraints expressing semantic relationships between hierarchies within and across dimensions (Inclusion, Exclusion, Totality, Partition, Simultaneity). For data manipulation, we redefine the dimensional operators so that users can better express their needs by specifying the set of instances to be analysed. This extension avoids inconsistencies when manipulating dimensional data. We also study the impact of these constraints on manipulation optimisation based on view materialisation. Taking the semantic constraints into account makes it possible to eliminate inconsistent views and to reduce the number of candidate views for materialisation. Second, we propose a design process for dimensional schemas that combines a top-down approach, based on decision makers' needs, with a bottom-up approach, based on the source data. A confrontation phase integrates the results of the two approaches to obtain a constellation dimensional schema that reflects both the decision makers' needs and the source data. To validate our proposals, we developed GMAG (Générateur de MAGasin de données dimensionnelles), a tool that assists the design of constrained dimensional schemas.

    Contraintes pour modèle et langage multidimensionnels

    National audience. This paper defines a constraint-based model dedicated to multidimensional databases. The model represents data as a constellation of facts (subjects of analysis) associated with dimensions (axes of analysis), which may be shared. Each dimension is organised into several hierarchies (analysis views) that integrate several levels of data granularity. To ensure data consistency, we introduce five semantic constraints (exclusion, inclusion, partition, simultaneity, totality), which can be intra-dimension or inter-dimension: intra-dimension constraints express relationships between hierarchies within the same dimension, whereas inter-dimension constraints concern hierarchies of distinct dimensions. We also study the repercussions of these constraints on multidimensional manipulations and provide extensions of the multidimensional operators.
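
    As an illustration only, the five constraints can be read as set relations between the dimension instances that belong to two hierarchies. The sketch below assumes that reading. The abstract gives no formal definitions, so the function bodies, the parameter names `h1`, `h2`, and `dimension`, and the sample data are all hypothetical.

```python
from typing import Set

# Assumption: each hierarchy is represented by the set of dimension
# instances that belong to it. This set-based reading is illustrative
# and is not taken from the paper.


def exclusion(h1: Set[str], h2: Set[str]) -> bool:
    """No instance belongs to both hierarchies."""
    return h1.isdisjoint(h2)


def inclusion(h1: Set[str], h2: Set[str]) -> bool:
    """Every instance of h1 also belongs to h2."""
    return h1 <= h2


def simultaneity(h1: Set[str], h2: Set[str]) -> bool:
    """An instance belongs to h1 exactly when it belongs to h2."""
    return h1 == h2


def totality(h1: Set[str], h2: Set[str], dimension: Set[str]) -> bool:
    """Every instance of the dimension belongs to h1 or h2 (or both)."""
    return (h1 | h2) >= dimension


def partition(h1: Set[str], h2: Set[str], dimension: Set[str]) -> bool:
    """Exclusion and totality hold together."""
    return exclusion(h1, h2) and totality(h1, h2, dimension)


# Hypothetical customer dimension with two geographic hierarchies.
customers = {"c1", "c2", "c3", "c4"}
french_geography = {"c1", "c2"}
us_geography = {"c3", "c4"}

print(partition(french_geography, us_geography, customers))
```

    For inter-dimension constraints, the same checks would presumably be applied to instances of different dimensions related through a shared fact; that case is not modelled here.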

    BigDimETL with NoSQL Database

    In the last decade, we have witnessed an explosion in the volume of data available on the Web, driven by rapid technological advances, the availability of smart devices, and social networks such as Twitter, Facebook, and Instagram. The concept of Big Data emerged in response to this constant increase. Many domains must take this growth into account, especially Business Intelligence (BI), because such data is full of knowledge that is crucial for effective decision making. However, it also raises new problems and challenges for Decision Support Systems that must be addressed. Accordingly, the purpose of this paper is to adapt Extract-Transform-Load (ETL) processes to Big Data technologies in order to support decision making and knowledge discovery. We propose a new approach called Big Dimensional ETL (BigDimETL), which addresses the ETL development process while taking the multidimensional structure into account. To accelerate data handling, we use the MapReduce paradigm and HBase as a distributed storage mechanism that provides data warehousing capabilities. Experimental results show that our adaptation of ETL operations performs well, especially for the Join operation.

    BigDimETL: ETL for multidimensional Big Data

    International audience. With the broad range of data available on the World Wide Web and the increasing use of social media such as Facebook, Twitter, and YouTube, the notion of "Big Data" has emerged. Big Data has become an important aspect of today's business because it is full of knowledge that is crucial for effective decision making. However, this kind of data brings new problems and challenges for Decision Support Systems (DSS) that must be addressed. In this paper, we propose a new approach called BigDimETL (Big Dimensional ETL) that addresses the ETL (Extract-Transform-Load) development process. Our approach focuses on integrating Big Data while taking the MultiDimensional Structure (MDS) into account through the MapReduce paradigm.

    Méthode à base de patterns pour la détection d'anomalies

    National audience. Detecting anomalies in real-world fluid distribution applications is a difficult task, particularly when several types of anomalies must be detected simultaneously. Solving this problem matters in several domains, for example in building management and supervision applications. In this article, we present the CoRP algorithm ("Composition of Remarkable Points"), a configurable approach based on modelling patterns for the simultaneous detection of multiple anomalies. CoRP applies a user-defined set of patterns to annotate remarkable points in a univariate time series with labels, then detects anomalies by composing these labels. Compared with algorithms from the literature, our approach proves more robust and more accurate at detecting all the types of anomalies observed in real deployments. Our experiments rely on real-world data and on benchmark data from the literature.

    CoRP: A Pattern-based Anomaly Detection in Time-series

    International audience. Monitoring and analyzing sensor networks is essential for exploring energy consumption in smart buildings or cities. However, the data generated by sensors are affected by various types of anomalies, which makes analysis tasks more complex. Anomaly detection is used to find anomalous observations in data. In this paper, we propose a pattern-based method for anomaly detection in sensor networks, entitled CoRP ("Composition of Remarkable Point"), that detects different types of anomalies simultaneously. Our method first detects remarkable points in time series based on patterns, then detects anomalies through pattern compositions. We compare our approach with methods from the literature and evaluate them through a series of experiments based on real data and benchmark data.

    Schema-independent Querying for Heterogeneous Collections in NoSQL Document Stores

    International audience. NoSQL document stores are well suited to efficiently loading and managing massive collections of heterogeneous documents without any prior structural validation. However, this flexibility becomes a serious challenge when querying heterogeneous documents: the user has to build complex queries or reformulate existing queries whenever new schemas are introduced in a collection. In this paper, we propose a novel approach, based on formal foundations, for building schema-independent queries designed to query multi-structured documents. We present a query enrichment mechanism that consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all its corresponding absolute paths in all the documents. We automate query reformulation via a set of rules that reformulate most document store operators, such as select, project, unnest, aggregate, and lookup. We then produce queries across multi-structured documents that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conducted experiments on synthetic datasets. Our results show that the induced overhead can be acceptable when compared with the effort needed to restructure the data or the time required to execute several queries corresponding to the different schemas inside the collection.

    Interrogation de données structurellement hétérogènes dans les bases de données orientées documents

    International audience. Document-oriented systems can store any document, whatever its schema. This flexibility creates potential heterogeneity among documents, which complicates querying because the same entity can be described according to different schemas. This article presents an approach for transparently querying document-oriented systems. To this end, we propose to generate a dictionary automatically as documents are inserted, associating each attribute with all the paths that lead to it. This dictionary is used to rewrite the user query as disjunctions of paths so that all documents are retrieved whatever their structure. Our experiments show that the execution cost of the rewritten query is largely acceptable compared with the cost of a query over homogeneous schemas.
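
    The dictionary-and-rewrite idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: documents are nested Python dicts (arrays are ignored), the dictionary maps every partial path to the absolute paths that end with it, and the rewritten query is emitted as a MongoDB-style `$or` filter. The names `iter_paths`, `update_dictionary`, and `rewrite_equals` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Set


def iter_paths(doc: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield every absolute dotted path found in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from iter_paths(value, path)


def update_dictionary(dictionary: Dict[str, Set[str]], doc: Dict[str, Any]) -> None:
    """Register the paths of one document, as would happen at insertion time."""
    for path in iter_paths(doc):
        parts = path.split(".")
        # Each suffix of an absolute path is a partial path that resolves to it.
        for i in range(len(parts)):
            dictionary[".".join(parts[i:])].add(path)


def rewrite_equals(dictionary: Dict[str, Set[str]], attribute: str, value: Any) -> Dict[str, Any]:
    """Rewrite `attribute == value` as a disjunction over all known paths."""
    paths = sorted(dictionary.get(attribute, ()))
    if not paths:
        # Unknown attribute: leave the condition unchanged.
        return {attribute: value}
    return {"$or": [{path: value} for path in paths]}


# Two hypothetical documents describing the same entity with different schemas.
dictionary: Dict[str, Set[str]] = defaultdict(set)
update_dictionary(dictionary, {"title": "A", "year": 2018})
update_dictionary(dictionary, {"info": {"title": "B", "year": 2019}})

print(rewrite_equals(dictionary, "title", "A"))
```

    Sorting the paths only makes the output deterministic. The order of the disjuncts does not change which documents match.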